The costs and impacts of government corruption range from impairing a country's economic growth to affecting its citizens' well-being and safety. Public contracting between government agencies and private-sector entities, referred to as public procurement, is fertile ground for corrupt practices, generating substantial monetary losses worldwide. Thus, identifying and deterring corrupt activities between the government and the private sector is paramount. However, due to several factors, corruption in public procurement is challenging to identify and track, so corrupt practices often go unnoticed. This paper proposes a machine learning model based on an ensemble of random forest classifiers, which we call hyper-forest, to identify and predict corrupt contracts in México's public procurement data. The method correctly classifies most of the corrupt and non-corrupt contracts evaluated in the dataset. Furthermore, we found that the most important predictors in the model are those related to the relationship between buyers and suppliers rather than those related to features of individual contracts. The proposed method is also general enough to be trained with data from other countries. Overall, our work presents a tool that can support decision-making to identify, predict, and analyze corruption in public procurement contracts.
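The abstract singles out buyer-supplier relationship predictors. The sketch below shows one way such pair-level features could be derived from raw contract records. It is a minimal illustration under assumptions. The input format `(buyer, supplier, amount)` and the feature names `n_contracts`, `share_of_buyer_spend`, and `buyer_n_suppliers` are hypothetical, not the paper's feature set, and the hyper-forest classifier itself is not shown.

```python
from collections import defaultdict


def pair_features(contracts):
    """Aggregate (buyer, supplier, amount) records into per-pair relationship features.

    Hypothetical feature set for illustration only."""
    n_pair = defaultdict(int)            # contracts per buyer-supplier pair
    amount_pair = defaultdict(float)     # total amount per pair
    amount_buyer = defaultdict(float)    # total amount spent by each buyer
    suppliers = defaultdict(set)         # distinct suppliers per buyer
    for buyer, supplier, amount in contracts:
        n_pair[(buyer, supplier)] += 1
        amount_pair[(buyer, supplier)] += amount
        amount_buyer[buyer] += amount
        suppliers[buyer].add(supplier)

    features = {}
    for (buyer, supplier), n in n_pair.items():
        spend = amount_buyer[buyer]
        features[(buyer, supplier)] = {
            "n_contracts": n,
            # fraction of the buyer's total spending that goes to this supplier
            "share_of_buyer_spend": amount_pair[(buyer, supplier)] / spend if spend else 0.0,
            "buyer_n_suppliers": len(suppliers[buyer]),
        }
    return features


records = [("A", "X", 100.0), ("A", "X", 300.0), ("A", "Y", 100.0), ("B", "X", 50.0)]
feats = pair_features(records)
print(feats[("A", "X")])
# {'n_contracts': 2, 'share_of_buyer_spend': 0.8, 'buyer_n_suppliers': 2}
```

A classifier would then consume these features alongside contract-level attributes.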
We introduce hp-greedy, a refinement approach for building gravitational wave surrogates as an extension of the standard reduced basis framework. Our proposal is data-driven, with a domain decomposition of the parameter space, local reduced bases, and a binary tree as the resulting structure, all obtained in an automated way. When compared to the standard global reduced basis approach, the numerical simulations of our proposal show three salient features: i) representations of lower dimension with no loss of accuracy, ii) a significantly higher accuracy for a fixed maximum dimensionality of the basis, in some cases by orders of magnitude, and iii) results that depend on the reduced basis seed choice used by the refinement algorithm. We first illustrate the key parts of our approach with a toy model and then present a more realistic use case of gravitational waves emitted by the collision of two spinning, non-precessing black holes. We discuss performance aspects of hp-greedy, such as overfitting with respect to the depth of the tree structure, and other hyperparameter dependencies. We envision two direct applications of the proposed hp-greedy refinement: i) a further acceleration of statistical inference, which might be complementary to focused reduced-order quadratures, and ii) the search for gravitational waves through clustering and nearest neighbors.
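The refinement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a one-dimensional parameter and real-valued snapshots with a plain Euclidean inner product. The greedy criterion is the squared projection error. A subdomain is split by assigning each parameter to the nearer of two anchors: the seed and the first greedily selected parameter. Each anchor then seeds the child it lands in. The names `greedy_rb`, `hp_greedy`, `leaves`, and the toy family sin(λx) are hypothetical.

```python
import numpy as np


def greedy_rb(X, seed, tol, nmax):
    """Standard greedy reduced basis over the rows of X (unit-norm snapshots).

    Returns the orthonormal basis (k x L), the greedily selected row indices,
    and the largest squared projection error over the training set."""
    basis, selected = [X[seed] / np.linalg.norm(X[seed])], [seed]
    while True:
        B = np.array(basis)
        proj = (X @ B.T) @ B                      # orthogonal projection of every snapshot
        errs = np.sum((X - proj) ** 2, axis=1)    # squared projection errors
        i = int(np.argmax(errs))
        if errs[i] <= tol or len(basis) >= nmax:
            return B, selected, float(errs[i])
        r = X[i] - proj[i]
        r -= B.T @ (B @ r)                        # re-orthogonalize once for stability
        basis.append(r / np.linalg.norm(r))
        selected.append(i)


def hp_greedy(params, X, seed, tol, nmax, depth=0, max_depth=4):
    """Recursively split a 1-D parameter domain until each local basis meets tol."""
    B, selected, err = greedy_rb(X, seed, tol, nmax)
    node = {"params": params, "basis": B, "err": err, "depth": depth}
    if err <= tol or depth >= max_depth or len(selected) < 2:
        return node                               # leaf: stop refining
    anchors = selected[:2]                        # seed and first greedy pick
    near_first = np.abs(params - params[anchors[0]]) <= np.abs(params - params[anchors[1]])
    node["children"] = []
    for mask, anchor in ((near_first, anchors[0]), (~near_first, anchors[1])):
        sub = np.flatnonzero(mask)
        child_seed = int(np.flatnonzero(sub == anchor)[0])  # anchor seeds its child
        node["children"].append(
            hp_greedy(params[sub], X[sub], child_seed, tol, nmax, depth + 1, max_depth))
    return node


def leaves(node):
    if "children" not in node:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy family h(x; lam) = sin(lam * x), normalized, for lam in [1, 5].
x = np.linspace(0.0, 2.0 * np.pi, 400)
lams = np.linspace(1.0, 5.0, 200)
X = np.sin(np.outer(lams, x))
X /= np.linalg.norm(X, axis=1, keepdims=True)

_, _, global_err = greedy_rb(X, seed=0, tol=1e-6, nmax=5)
tree = hp_greedy(lams, X, seed=0, tol=1e-6, nmax=5, max_depth=4)
leaf_errs = [leaf["err"] for leaf in leaves(tree)]
print(f"global: {global_err:.2e}, leaves: {len(leaf_errs)}, worst leaf: {max(leaf_errs):.2e}")
```

The script prints the global error next to the worst leaf error, so the effect of refinement at a fixed maximum dimension (`nmax`) can be compared directly.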
Modern machine learning pipelines are limited by data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to maintain a large-scale model trained on growing annotation sets. Continual learning addresses this problem directly, with the ultimate goal of devising methods in which a neural network effectively learns relevant patterns for new (unseen) classes without significantly degrading its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT outperforms state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
Quality management and assurance are key for space agencies to guarantee the success of space missions, which are high-risk and extremely costly. In this paper, we present a system that generates quizzes, a common resource for evaluating the effectiveness of training sessions, from documents about quality assurance procedures in the Space domain. Our system leverages state-of-the-art auto-regressive models such as T5 and BART to generate questions, and a RoBERTa model to extract answers to those questions, thus verifying their suitability.
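One way to read "extract answers for such questions, thus verifying their suitability" is as a round-trip filter. Under this reading, a generated question is kept only if a QA model, given the question and the source passage, recovers the expected answer. The sketch below assumes that reading and uses SQuAD-style token F1 as the agreement test. The 0.8 threshold and the helper names are hypothetical. `answer_fn` stands in for the RoBERTa reader and is stubbed here rather than calling a real model.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(predicted, expected):
    """SQuAD-style token-overlap F1 between two answer strings."""
    p, g = normalize(predicted), normalize(expected)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def filter_questions(candidates, context, answer_fn, threshold=0.8):
    """Keep (question, answer) pairs whose answer a QA model recovers from the context."""
    return [(q, a) for q, a in candidates
            if token_f1(answer_fn(q, context), a) >= threshold]


# Stub QA model standing in for a RoBERTa reader (hypothetical outputs).
stub_answers = {"Q1": "The root cause.", "Q2": "a launch date"}
kept = filter_questions([("Q1", "root cause"), ("Q2", "the review board")],
                        context="...", answer_fn=lambda q, ctx: stub_answers[q])
print(kept)  # [('Q1', 'root cause')]
```

Token F1 tolerates small surface differences such as articles and punctuation, which an exact-match check would reject.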
We present SpaceQA, to the best of our knowledge the first open-domain QA system in Space mission design. SpaceQA is part of an initiative by the European Space Agency (ESA) to facilitate the access, sharing, and reuse of information about Space mission design within the agency and with the public. We adopt a state-of-the-art architecture consisting of a dense retriever and a neural reader, and opt for an approach based on transfer learning rather than fine-tuning due to the lack of domain-specific annotated data. Our evaluation on a test set produced by ESA is largely consistent with the results originally reported for the evaluated retrievers and confirms the need for fine-tuning for reading comprehension. At the time of writing, ESA is piloting SpaceQA internally.
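The retrieval half of a dense retriever-reader pipeline can be sketched as maximum inner-product search over precomputed passage embeddings. The top-scoring passages are then handed to the reader, which extracts an answer span. The 3-dimensional vectors below are toy stand-ins for the outputs of hypothetical question and passage encoders, and the reader is not shown.

```python
import numpy as np


def retrieve(query_vec, passage_vecs, k=2):
    """Return the indices and scores of the k passages with the largest inner product."""
    scores = passage_vecs @ query_vec
    top = np.argsort(-scores, kind="stable")[:k]
    return top, scores[top]


# Toy embeddings standing in for hypothetical question/passage encoder outputs.
passages = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.7, 0.7, 0.0]])
query = np.array([0.9, 0.1, 0.0])
top, scores = retrieve(query, passages)
print(top.tolist())  # [0, 2]
```

Because passage embeddings do not depend on the question, they can be computed once offline. Only the question needs encoding at query time.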
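This paper presents the final results of the Out-of-Vocabulary (OOV) 2022 challenge. The OOV competition introduces an important aspect that is not commonly studied by optical character recognition (OCR) models, namely, the recognition of scene text instances unseen at training time. The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks: end-to-end and cropped scene text recognition. A thorough analysis of the results of the baselines and the different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge opens an important area of exploration for developing scene text models with more robust and generalized predictions.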
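The OOV setting hinges on splitting evaluation instances by whether their transcription occurs in the training vocabulary. A minimal sketch of that split and of per-split word accuracy follows. Two choices here are assumptions, not the challenge's official protocol: treating the vocabulary as case-insensitive (the hypothetical `case_sensitive` flag) and scoring by exact string match.

```python
def split_by_vocabulary(train_words, instances, case_sensitive=False):
    """Partition evaluation instances into in-vocabulary and out-of-vocabulary (OOV)."""
    norm = (lambda w: w) if case_sensitive else str.lower
    vocab = {norm(w) for w in train_words}
    in_vocab, oov = [], []
    for inst in instances:
        (in_vocab if norm(inst["text"]) in vocab else oov).append(inst)
    return in_vocab, oov


def word_accuracy(instances, predict):
    """Exact-match word accuracy; NaN for an empty split."""
    if not instances:
        return float("nan")
    return sum(predict(inst) == inst["text"] for inst in instances) / len(instances)


train_words = ["STOP", "exit", "Open"]
test_set = [{"id": 1, "text": "stop"}, {"id": 2, "text": "Zebra"}, {"id": 3, "text": "open"}]
predictions = {1: "stop", 2: "Zebra", 3: "opem"}  # hypothetical recognizer output
iv_split, oov_split = split_by_vocabulary(train_words, test_set)
for name, split in (("in-vocabulary", iv_split), ("OOV", oov_split)):
    print(name, word_accuracy(split, lambda inst: predictions[inst["id"]]))
```

Reporting the two accuracies separately makes any gap between seen and unseen words visible instead of averaging it away.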
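This paper presents a new method for restoring digital videos via a deep plug-and-play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying the method to restore a digital video from a degraded video observation. This way, a network trained for denoising can be reused for other video restoration tasks. Our experiments on video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit of using a network specifically designed for video denoising: under the same PnP formulation, it yields better restoration performance and better temporal stability. Moreover, our method compares favorably with applying a different state-of-the-art PnP scheme separately to each frame of the sequence. This opens new perspectives in the field of video restoration.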
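The alternating scheme can be sketched with half-quadratic splitting (HQS), one common PnP formulation. The abstract does not say which splitting the authors use. For interpolation of random missing pixels the data step has a closed form, which keeps the example short. A 3x3x3 spatio-temporal box filter (`box_denoiser`, hypothetical) stands in for the deep video denoising network, and the penalty schedule `mus` is an arbitrary choice.

```python
import numpy as np


def box_denoiser(video):
    """Stand-in denoiser: 3x3x3 spatio-temporal box filter with edge padding."""
    T, H, W = video.shape
    padded = np.pad(video, 1, mode="edge")
    out = np.zeros((T, H, W))
    for dt in range(3):
        for di in range(3):
            for dj in range(3):
                out += padded[dt:dt + T, di:di + H, dj:dj + W]
    return out / 27.0


def pnp_hqs_missing_pixels(y, mask, denoiser, mus):
    """HQS for y = mask * x: alternate a denoising step (replacing the prior's
    proximal operator) with the closed-form data step."""
    x = y.astype(float)
    for mu in mus:
        z = denoiser(x)                          # prior step: plug in the denoiser
        # data step: argmin_x 0.5*||mask*x - y||^2 + 0.5*mu*||x - z||^2 (mask is 0/1)
        x = (mask * y + mu * z) / (mask + mu)
    return x


rng = np.random.default_rng(0)
t, i, j = np.meshgrid(np.arange(8), np.arange(16), np.arange(16), indexing="ij")
truth = 1.5 + np.sin(0.3 * i + 0.2 * j + 0.1 * t)        # smooth toy "video" (T, H, W)
mask = (rng.random(truth.shape) > 0.5).astype(float)     # about half the pixels observed
y = mask * truth
restored = pnp_hqs_missing_pixels(y, mask, box_denoiser, mus=np.geomspace(0.1, 10.0, 30))
print(np.sqrt(np.mean((y - truth) ** 2)), np.sqrt(np.mean((restored - truth) ** 2)))
```

Swapping `box_denoiser` for a different denoiser changes only the prior step. This is the property that lets one denoising network serve several restoration tasks.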
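Hyperparameter optimization (HPO) is a well-studied research field. However, the effects and interactions of the components in an HPO pipeline are not yet well investigated. We therefore ask: can the HPO landscape be biased by the pipeline used to evaluate individual configurations? To address this question, we propose to analyze the effect of the HPO pipeline on HPO problems using fitness landscape analysis. In particular, we study the DS-2019 HPO benchmark dataset, looking for patterns that could indicate evaluation pipeline malfunctions, and relate them to HPO performance. Our main findings are: (i) in most cases, large groups of diverse hyperparameters (i.e., multiple configurations) yield the same poor performance, most likely associated with majority-class prediction models; (ii) in these cases, a worsened correlation between the observed fitness and the average fitness in the neighborhood is observed, potentially making local-search-based HPO strategies harder to deploy. Finally, we conclude that the HPO pipeline definition might negatively affect the HPO landscape.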
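The two landscape quantities behind these findings can be computed on a discretized configuration space. The first is the share of configurations that land on the single most common fitness value, a plateau such as the one produced by majority-class predictors. The second is the correlation between each configuration's fitness and the mean fitness of its neighbors. In this sketch, the grid encoding, the one-step neighborhood, and the toy fitness functions are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def neighbors(config, sizes):
    """Configurations one grid step away in exactly one hyperparameter."""
    for k, size in enumerate(sizes):
        for step in (-1, 1):
            value = config[k] + step
            if 0 <= value < size:
                yield config[:k] + (value,) + config[k + 1:]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def neighborhood_correlation(fitness, sizes):
    """Correlation between each configuration's fitness and its neighbors' mean fitness."""
    own, neighbor_mean = [], []
    for config in product(*(range(s) for s in sizes)):
        nbrs = list(neighbors(config, sizes))
        own.append(fitness[config])
        neighbor_mean.append(sum(fitness[c] for c in nbrs) / len(nbrs))
    return pearson(own, neighbor_mean)


def plateau_fraction(fitness, ndigits=9):
    """Share of configurations that hit the single most common fitness value."""
    counts = Counter(round(v, ndigits) for v in fitness.values())
    return counts.most_common(1)[0][1] / len(fitness)


sizes = (10, 10)
smooth = {c: 0.9 - 0.01 * ((c[0] - 3) ** 2 + (c[1] - 4) ** 2)
          for c in product(range(10), range(10))}
# Hypothetical pipeline failure: every config with c[0] >= 6 collapses to a
# majority-class accuracy of 0.5.
collapsed = {c: (0.5 if c[0] >= 6 else v) for c, v in smooth.items()}
print(neighborhood_correlation(smooth, sizes), plateau_fraction(collapsed))
```

Applied to benchmark data, the same two functions would be run over recorded configuration-fitness pairs instead of these synthetic landscapes.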
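Many classical and learning-based optical flow methods rely on hierarchical concepts to improve both accuracy and robustness. However, one of the currently most successful approaches, RAFT, hardly exploits such concepts. In this work, we show that multi-scale ideas are still valuable. More precisely, using RAFT as a baseline, we propose a novel multi-scale neural network that combines several hierarchical concepts within a single estimation framework. These concepts include (i) a partially shared coarse-to-fine architecture, (ii) multi-scale features, (iii) a hierarchical cost volume, and (iv) a multi-scale multi-iteration loss. Experiments on MPI Sintel and KITTI clearly demonstrate the benefits of our approach. They show not only substantial improvements compared to RAFT, but also state-of-the-art results, in particular in non-occluded regions. Code will be available at https://github.com/cv-stuttgart/ms_raft.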
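A multi-scale, multi-iteration loss can be sketched by applying RAFT's exponentially weighted sequence loss at every scale. Iteration i of N gets weight gamma^(N-i), and each scale is compared against a correspondingly downsampled ground truth. Several choices here are assumptions, and MS-RAFT's exact weighting may differ: mean endpoint error as the distance, average-pool downsampling with vector rescaling, gamma = 0.8, and optional per-scale weights.

```python
import numpy as np


def downsample_flow(flow, factor):
    """Average-pool an (H, W, 2) flow field by `factor` and rescale the vectors."""
    H, W, _ = flow.shape
    h, w = H // factor, W // factor
    pooled = flow[: h * factor, : w * factor].reshape(h, factor, w, factor, 2).mean(axis=(1, 3))
    return pooled / factor  # displacements shrink with resolution


def multiscale_multiiter_loss(preds, flow_gt, gamma=0.8, scale_weights=None):
    """preds maps a downsampling factor (1, 2, 4, ...) to the list of flow
    estimates produced by successive refinement iterations at that scale."""
    total = 0.0
    for factor, sequence in preds.items():
        gt = flow_gt if factor == 1 else downsample_flow(flow_gt, factor)
        w_scale = 1.0 if scale_weights is None else scale_weights[factor]
        n = len(sequence)
        for i, flow in enumerate(sequence, start=1):
            epe = np.linalg.norm(flow - gt, axis=-1).mean()  # mean endpoint error
            total += w_scale * gamma ** (n - i) * epe        # later iterations weigh more
    return float(total)


rng = np.random.default_rng(0)
gt = rng.normal(size=(8, 8, 2))
early = multiscale_multiiter_loss({1: [gt + 1.0, gt]}, gt)  # error only in iteration 1
late = multiscale_multiiter_loss({1: [gt, gt + 1.0]}, gt)   # error only in iteration 2
print(early, late)
```

The exponential weights penalize errors in later iterations more heavily, which pushes the refinement toward converging rather than stalling.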
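Model selection in machine learning (ML) is a crucial part of the Bayesian learning procedure. Model choice may impose strong biases on the resulting predictions, which can hinder the performance of methods such as Bayesian neural networks and neural samplers. On the other hand, newly proposed approaches for Bayesian ML exploit features of approximate inference in function space with implicit stochastic processes (a generalization of Gaussian processes). The sparse implicit processes (SIP) approach is particularly successful in this regard, since it is fully trainable and achieves flexible predictions. Here, we extend the original experiments to show that SIP is capable of correcting model bias when the data-generating mechanism differs strongly from the one implied by the model. Using synthetic datasets, we show that SIP provides predictive distributions that reflect the data better than the exact predictions of the initial, misspecified model.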
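An implicit process is a distribution over functions that is defined only through a sampler, for example a neural network with random weights. A Gaussian process is the special case in which every finite set of function values is jointly Gaussian. The sketch below draws function samples from a small, randomly weighted tanh network and computes their empirical mean and covariance. This is one simple way to relate an implicit process to a GP by moment matching. It illustrates the concept only: it is not the SIP inference procedure, and the network size and N(0, 1) weight prior are arbitrary choices.

```python
import numpy as np


def sample_bnn_functions(x, n_samples, hidden=50, seed=0):
    """Draw function values f_s(x) from a one-hidden-layer tanh network whose
    weights are sampled from a standard normal prior. Returns (n_samples, len(x))."""
    rng = np.random.default_rng(seed)
    x = x.reshape(1, -1, 1)                                   # (1, N, 1)
    W1 = rng.normal(size=(n_samples, 1, hidden))
    b1 = rng.normal(size=(n_samples, 1, hidden))
    W2 = rng.normal(size=(n_samples, hidden, 1)) / np.sqrt(hidden)
    h = np.tanh(x @ W1 + b1)                                  # (S, N, hidden)
    return (h @ W2)[..., 0]                                   # (S, N)


def gaussian_moments(f):
    """Empirical mean and covariance of sampled function values (moment matching)."""
    mean = f.mean(axis=0)
    centered = f - mean
    cov = centered.T @ centered / (f.shape[0] - 1)
    return mean, cov


x = np.linspace(-2.0, 2.0, 7)
f = sample_bnn_functions(x, n_samples=500)
mean, cov = gaussian_moments(f)
print(mean.round(2), np.diag(cov).round(2))
```

The GP fitted this way keeps only two moments. An implicit process can represent non-Gaussian function distributions, which is part of what makes it a generalization.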